This report presents the computational results for the instances. All instances are minimised, i.e. if an objective function \(z(x)\) is maximized, we minimise \(-z(x)\) instead. Note that all instances contain only binary variables.

Instances

We consider a total of 1297 instances named

Forget20_[problemClass]_[n]_[p]_[rangeOfCosts]_[costGenerationMethod]_[constraintId]_[id].raw

where

The cost generation methods for generating the coefficients of the objective functions are (given in column coef):

For further details see the documentation of the function genSample in the R package gMOIP.

More detailed results for each instance can be seen in the files named [instanceName]_results.html (see for instance this example).

Input and output statistics

A set of statistics is stored for each instance and algorithm run. The statistics are stored in a JSON result file (v1.0) for each instance and algorithm configuration, named [Instance name]_[Algorithm config]_result.json. Most statistics are aggregated in the file stat.csv.
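Assuming each result file is a flat JSON dictionary of the statistics listed below (an assumption; the exact v1.0 schema is defined by the solver), the per-run files can be collected into a single table comparable to stat.csv. A minimal Python sketch:

```python
import glob
import json

import pandas as pd


def aggregate_results(pattern="*_result.json"):
    """Collect per-run result files ([Instance name]_[Algorithm config]_result.json)
    into one data frame, comparable to stat.csv. Assumes each file is a flat
    JSON dict of the statistics described in this section."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            rows.append(json.load(f))  # one row per instance/configuration run
    return pd.DataFrame(rows)
```

Missing statistics in individual files simply become NaN columns in the resulting frame, which matches how stat.csv can be analysed per column.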

Input statistics

For each instance we have the following statistics:

We have:

  • pb: Problem class
    • KP = Knapsack Problem
    • AP = Assignment Problem
    • UFLP = Uncapacitated Facility Location Problem
  • n: Number of variables.
  • p: Number of objectives.
  • rangeC: the range within which the objective coefficients are generated. For the facility location problems, two sets of costs are generated: the assignment costs (\(c_{ij}\)) and the facility opening costs (\(f_j\)). It is reasonable to assume that these two sets of costs may not take their values in the same range. In this case rangeC defines two ranges: the assignment costs lie in \([a,b]\), and the facility opening costs fall into one of two instance categories:
    • facility location easy: \(f_j \in [a+b, 2b]\). It is called “easy” because high facility opening costs seem to make the problem easier.
    • facility location hard: \(f_j \in [0.1a,0.1b]\). It is called “hard” because small facility opening costs seem to make the problem harder.
  • coef: The method used for the generation of the objective coefficients (see section Research questions). We have:
    • random = random coefficient generation
    • 2box = two-boxes generation
    • spheredown = generation on the lower part of a sphere (in minimisation)
    • sphereup = generation on the upper part of a sphere (in minimisation)
  • ratioNDcoef: Proportion of objective coefficient vectors that are non-dominated with respect to the other coefficient vectors. Here, each variable is associated with a vector in \(\mathbb{R}^p\) collecting its coefficient in every objective. Examples of the four ways of generating the objective coefficients are given below:
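As an illustration, ratioNDcoef can be computed with a standard pairwise dominance check over the coefficient vectors. The following Python sketch (not the gMOIP implementation) treats each row of C as the coefficient vector of one variable, under minimisation:

```python
import numpy as np


def nondominated_ratio(C):
    """Share of rows of C (one coefficient vector in R^p per variable) that
    are non-dominated with respect to the other rows (minimisation)."""
    n = C.shape[0]
    count = 0
    for i in range(n):
        others = np.delete(C, i, axis=0)
        # row i is dominated if some other row is <= everywhere
        # and strictly < in at least one objective
        dominated = np.any(np.all(others <= C[i], axis=1) &
                           np.any(others < C[i], axis=1))
        count += not dominated
    return count / n
```

For example, with the rows (1, 2), (2, 1) and (2, 2), the first two vectors are non-dominated and the third is dominated, giving a ratio of 2/3.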

Random

The coefficients are generated randomly with a uniform distribution in the range \([a,b]\). For three objectives, randomly generated coefficients look as follows.

[Interactive 3D plot of the generated coefficients omitted.]

Sphere down

The coefficients are generated on the lower part of a sphere (see next picture). Note that the sphere is adjusted such that the coefficients are in the range \([a,b]\), i.e. the sphere is not necessarily included in \([a,b]^p\).

[Interactive 3D plot of the generated coefficients omitted.]

Sphere up

The coefficients are generated on the upper part of a sphere (see next picture). Note that the sphere is adjusted such that the coefficients are in the range \([a,b]\), i.e. the sphere is not necessarily included in \([a,b]^p\).

[Interactive 3D plot of the generated coefficients omitted.]
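One plausible construction for the two sphere methods (a sketch; the exact adjustment used by genSample in gMOIP may differ) is to sample uniform directions on the unit sphere, mirror each point onto the chosen half, and then rescale every coordinate into \([a,b]\):

```python
import numpy as np


def sphere_coeffs(n, p, a, b, part="down", seed=0):
    """Sample n coefficient vectors on the lower ('down') or upper ('up')
    half of a sphere, then rescale each coordinate into [a, b]."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n, p))
    d /= np.linalg.norm(d, axis=1, keepdims=True)  # uniform on the unit sphere
    s = np.where(d.sum(axis=1) >= 0, 1.0, -1.0)    # which half each point is on
    pts = d * (-s if part == "down" else s)[:, None]  # mirror onto chosen half
    # adjust the sphere to the range: rescale every coordinate into [a, b],
    # so the sphere itself need not be contained in [a, b]^p
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    return a + (pts - lo) / (hi - lo) * (b - a)
```

The per-coordinate rescaling mirrors the remark above: the coefficients end up in \([a,b]\) even though the underlying sphere is not necessarily included in \([a,b]^p\).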

2box

The coefficients are generated randomly but in two specific parts of \([a,b]^p\) (see next picture).

[Interactive 3D plot of the generated coefficients omitted.]
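A minimal sketch of the 2box idea, assuming the two parts are the boxes near the lower and upper corners of \([a,b]^p\) (the exact boxes used by genSample in gMOIP may differ):

```python
import numpy as np


def two_box_coeffs(n, p, a, b, seed=0):
    """Sample coefficient vectors uniformly from two sub-boxes of [a, b]^p:
    one near the lower corner and one near the upper corner. The box
    placement is an assumption, not the exact gMOIP construction."""
    rng = np.random.default_rng(seed)
    mid = (a + b) / 2
    lower = rng.uniform(a, mid, size=(n // 2, p))      # first box
    upper = rng.uniform(mid, b, size=(n - n // 2, p))  # second box
    return np.vstack([lower, upper])
```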

Algorithm configuration

The algorithm is configured using:

The branch and bound algorithm always uses the linear relaxation as lower bound set and the incumbent set as upper bound set. At each node, the extreme points of the lower bound set are checked for integrality and possibly added to the upper bound set. The different configurations are based on:

  • nodesel: node selection strategy.
  • varsel: variable selection strategy.
  • OB: objective branching strategy.

Combinations of nodesel, varsel and OB give the configuration of the algorithm:

  • Node selection: in the multi-objective branch and bound literature, depth-first strategies are (almost) always used and are considered better than breadth-first strategies. However, if a problem whose single-objective version is “easy” is solved, then many non-dominated points may be found close to the root node, in the early stages of the tree. Hence, using a breadth-first strategy may provide a better upper bound set earlier in the algorithm, since a larger number of points is expected to be found shallow in the tree, while only a few points can be found at each node deep in the tree.

  • Variable selection: to the best of our knowledge, no extensive study of variable selection has been conducted in the literature. Sometimes this component is not even mentioned. As it is not the main purpose of this study, only two rules that rely on the characteristics of the lower bound set are tested here.

    • mof: Most Often Fractional. The branching is operated on the variable that is the most often fractional among the extreme points of the lower bound set.

    • mfavg: Most Fractional in AVeraGe. The branching is operated on the variable such that its average value among the extreme points of the lower bound set is the most fractional, i.e. the closest to 0.5.

  • Branching scheme: in a regular branch and bound, branching is operated (i.e. sub-problems are created) in the decision space. In the bi-objective case, it has been shown that creating additional sub-problems in the objective space (a procedure called objective branching in this study) leads to better computational times. Three versions of the branch and bound will be tested:

    • None: no objective branching is performed. Hence, this version is just a regular branch and bound.

    • exact: objective branching is performed using the algorithm from [master thesis paper].

    • unique cone: a unique sub-problem is created at each node. In this case, objective branching is applied on the nadir point of the local upper bounds dominated by the lower bound set. See [master thesis paper] for more details.
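The two variable selection rules above can be sketched as follows, assuming the extreme points of the lower bound set are given as the rows of a matrix with one column per (binary) variable; the function name is hypothetical:

```python
import numpy as np


def select_branching_var(extreme_points, rule="mof", tol=1e-6):
    """Pick a branching variable from the extreme points of the lower bound
    set (rows = extreme points, columns = variables, values in [0, 1])."""
    X = np.asarray(extreme_points, dtype=float)
    frac = (X > tol) & (X < 1 - tol)  # which entries are fractional
    if rule == "mof":
        # Most Often Fractional: variable fractional in the most extreme points
        return int(np.argmax(frac.sum(axis=0)))
    if rule == "mfavg":
        # Most Fractional in AVeraGe: column average closest to 0.5
        return int(np.argmin(np.abs(X.mean(axis=0) - 0.5)))
    raise ValueError(f"unknown rule: {rule}")


X = [[0.0, 0.5, 1.0],
     [0.2, 0.5, 1.0],
     [0.0, 1.0, 0.4]]
print(select_branching_var(X, "mof"))    # variable 1 is fractional in 2 points
print(select_branching_var(X, "mfavg"))  # column means 0.07, 0.67, 0.8 -> var 1
```

Here both rules happen to pick the same variable; in general they can disagree, which is what the computational comparison investigates.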

Output statistics

For each algorithm run we have the following statistics.

  • solved: 1 if the instance is solved within 3600 sec, 0 otherwise.
  • YN: size of \(\mathcal{Y}_N\). If solved = 0, it represents the size of the upper bound set at 3600 sec, when the algorithm stops.
  • nbnodes: number of nodes explored.
  • mindepthT: minimal depth of a leaf node.
  • maxdepthT: maximal depth of a leaf node (and thus of the tree).
  • avgdepthT: average depth of the leaf nodes.
  • avgdepthYN: average depth of the nodes where the non-dominated points were found.
  • nbleaf: number of leaf nodes.
  • nbinfeas: number of nodes pruned by infeasibility.
  • pctinfeas: proportion (in %) of leaf nodes pruned by infeasibility.
  • tpsinfeas: average time spent pruning a node by infeasibility (in msec).
  • nbopt: number of nodes pruned by optimality.
  • pctopt: proportion (in %) of leaf nodes pruned by optimality.
  • tpsopt: average time spent pruning a node by optimality (in msec).
  • nbdomi: number of nodes pruned by dominance.
  • pctdomi: proportion (in %) of leaf nodes pruned by dominance.
  • avgdomi: average time spent pruning a node by dominance (in msec).
  • nbLB: number of lower bound sets computed.
  • avgfacets: average number of facets in the lower bound set (i.e. in \(\mathcal{L} + \mathbb{R}^p\)).
  • avgNDf: average number of strictly non-dominated facets.
  • pctavgNDf: proportion (in %) of facets that are strictly non-dominated.
  • avgWNDf: average number of weakly non-dominated facets.
  • pctavgWNDf: proportion (in %) of facets that are weakly non-dominated.
  • maxfacets: maximal number of facets a lower bound set had in the tree.
  • maxNDf: number of strictly non-dominated facets in the lower bound set with the maximal number of facets.
  • pctmaxNDf: proportion (in %) of facets that are strictly non-dominated in the lower bound set with the maximal number of facets.
  • maxWNDf: number of weakly non-dominated facets in the lower bound set with the maximal number of facets.
  • pctmaxWNDf: proportion (in %) of facets that are weakly non-dominated in the lower bound set with the maximal number of facets.
  • tpstotal: CPU time (in sec) used to solve the instance. 3600 if the instance is not solved.
  • tpsLB: CPU time (in sec) used to compute lower bound sets.
  • pcttpsLB: proportion (in %) of the total CPU time spent computing lower bound sets.
  • tpsdomi: CPU time (in sec) used for the dominance tests performed when the algorithm determines whether a node can be pruned by dominance.
  • pcttpsdomi: proportion (in %) of the total CPU time spent in these dominance tests.
  • tpsUB: CPU time (in sec) used to update the upper bound set.
  • pcttpsUB: proportion (in %) of the total CPU time spent updating the upper bound set.
  • tpsnodesel: CPU time (in sec) used to choose the next node to develop.
  • pcttpsnodesel: proportion (in %) of the total CPU time spent choosing the next node to develop.
  • tpsvarsel: CPU time (in sec) used to choose the variable to branch on.
  • pcttpsvarsel: proportion (in %) of the total CPU time spent choosing the variable to branch on.
  • tpsOB: CPU time (in sec) used to create the sub-problems in the objective space, i.e. to compute objective branching. (Note: this requires two steps: computing the SLUBs and performing additional dominance tests to determine the dominance status of each local upper bound. This number accounts for both steps.)
  • pcttpsOB: proportion (in %) of the total CPU time spent computing objective branching.
  • tpsSLUB: CPU time (in sec) used to compute the super local upper bounds.
  • pcttpsSLUB: proportion (in %) of the total CPU time spent computing the super local upper bounds.
  • tpsdomiLUB: CPU time (in sec) used for the additional dominance tests that determine the dominance status of each local upper bound.
  • pcttpsdomiLUB: proportion (in %) of the total CPU time spent in the additional dominance tests on the local upper bounds.
  • nbOB: number of nodes where two or more sub-problems are created in the objective space. When using exact objective branching (OB = exact), it shows in particular how often it is actually possible to split the objective space with the definition of the sub-problems that we used (\(z(x) \leqq \bar{z}\), \(\bar{z} \in \mathbb{R}^p\)).
  • pctnbOB: proportion (in %) of the explored nodes that were split into two or more sub-problems in the objective space.
  • avgdepthOB: [relevant only if OB = exact] average depth of the nodes split in two or more sub-problems in the objective space.
  • mindepthOB: [relevant only if OB = exact] minimal depth of the nodes split in two or more sub-problems in the objective space.
  • maxdepthOB: [relevant only if OB = exact] maximal depth of the nodes split in two or more sub-problems in the objective space.
  • avgnbpbOB: [relevant only if OB = exact] average number of sub-problems created in the objective space in the nodes split in two or more sub-problems in the objective space.
  • maxnbpbOB: [relevant only if OB = exact] maximal number of sub-problems created in the objective space over the nodes split into two or more sub-problems in the objective space.

For each instance we also store the non-dominated set \(\mathcal{Y}_N\) found by the algorithm and, if an exact solution was found, the efficient set \(\mathcal{X}_E\). Statistics on how the non-dominated points are found are stored in yNStat. The statistics are the following:

  • the \(p\) first columns correspond to the values of the objective functions.
  • node: number of the node where this point has been discovered. The higher this number is, the later the point has been discovered.
  • time: time (in sec) elapsed between the start of the algorithm and when this point has been found (for the first time).
  • depth: depth of the node where this point has been found (for the first time).

yNStat is sorted in exactly the same order as \(\mathcal{X}_E\), i.e. each row represents a non-dominated point and its corresponding solution.
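As a small usage example, the discovery times in yNStat can be turned into a profile of how quickly the upper bound set grows over a run. This assumes yNStat has been loaded as a data frame with the columns described above:

```python
import pandas as pd


def discovery_profile(ynstat):
    """From a yNStat table (p objective columns followed by node, time and
    depth), count how many non-dominated points were found by each time stamp."""
    df = ynstat.sort_values("time").reset_index(drop=True)
    return pd.DataFrame({"time": df["time"], "found": range(1, len(df) + 1)})
```

Plotting `found` against `time` for two algorithm configurations gives a direct picture of which one finds a good upper bound set earlier, which is the motivation given above for testing breadth-first node selection.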

Detailed results for each instance

Detailed results for each instance can be generated using instance.Rmd. The report is already generated for some of the instances (see links in the table with input statistics). Some instances that might be of interest:

Research questions

We first consider

  1. How do input statistics affect the number of nondominated solutions?
    1. Does the cost range have an effect?
    2. Does the generation method of the objective coefficients (coef) have an effect?
    3. Does the proportion of objective coefficient vectors that are non-dominated (ratioNDcoef) have an effect?
    4. Can we exclude some values of the input statistics for further study (e.g. some ranges)?
  2. Which instances are hard to solve?
    1. What is the CPU time for each instance?
    2. Which sphere generation instances are hardest to solve?
    3. What is the std.dev. within each instance group?
    4. In which subprocedures is the CPU time used?

Next, we study how some components of the branch and bound influence the behaviour of the algorithm:

  1. How do depth-first and breadth-first strategies affect the behaviour of the algorithm? That is, we consider nodesel for fixed varsel and OB.
    1. Are some problem classes solved best with one node selection strategy compared to others?
    2. Are ‘easy’ problems solved best with one node selection strategy compared to others?
    3. Do different OB strategies affect the choice of node selection strategy?

The primary focus is on computational time and the number of nodes explored, i.e. the size of the branching tree when the algorithm ends.

Analysis: Research Question 1 (R1) - How do input statistics affect the number of nondominated solutions?

In this section, relations between \(|\mathcal{Y}_N|\) (YN) and various input statistics are examined. How do input statistics affect the number of nondominated solutions?

R1a: Does the cost range have an effect?

Based on the plots we may answer the research question by concluding that the range has an effect. A small coefficient range gives a smaller range of possible objective values and hence fewer possible nondominated points. Note that a smaller range may result in a larger number of alternative solutions corresponding to a nondominated point. The speed of the algorithm may be affected by alternative optimal solutions.

For KP and AP, scaling the costs does not affect the number of nondominated points. For UFLP there is an effect due to the way the costs are generated, so scaling does not give the same results here (we have to choose a setting). [Missing comments for IP]

We may conclude that further study should only consider the range [1,1000] and the two UFLP cases with this range.

Note that the number of nondominated points increases with the number of variables, since the total possible range of the objectives increases with n (there is room for more points).

R1b: Does the generation method of the objective coefficients (coef) have an effect?

Based on the plots we may answer the research question by concluding that the objective coefficient generation method has a large effect on the number of solutions. Random generation produces the lowest number on average, followed by 2box. The two sphere methods seem to give approximately the same results.

Note that, in general, the speed of the algorithm is highly correlated with the number of non-dominated points. That is, instances with a high number of nondominated points are harder to solve. Based on this, we may conclude that using one of the sphere methods for further study seems a good choice. We may also consider comparing it against the easiest method.

R1c: Does the proportion of objective coefficient vectors that are non-dominated (ratioNDcoef) have an effect?

This is highly related to the objective coefficient generation method (and the range):

Note that for fixed pb and rangeC, the ratio increases approximately in the order random, 2box, spheredown, sphereup (with random and 2box, and spheredown and sphereup, often close to each other). However, it is important to keep in mind that the ratio depends heavily on the problem class (how the variables are linked together). The UFLP has much smaller ratios because its costs relate to different aspects of the problem.

Based on the plot we may answer the research question by concluding that the ratio may have a large effect on the number of non-dominated solutions. Given a fixed problem class and objective coefficient range, the ratio may have a large effect on the difficulty of the instance. Note, however, that if the problem class and range are not known, the number of ND points cannot be guessed from the ratio alone.

R1d: Can we exclude some values of the input statistics for further study (e.g. some ranges)?

From the above analysis we may conclude that instances with a lower number of nondominated points can be generated by:

  1. Using a narrow cost range.
  2. Using random cost generation, which results in a low ratio.

To obtain harder instances we may skip these and instead use instances with a higher number of nondominated points, generated by:

  1. Using a wide cost range (e.g. [1,1000]).
  2. Using a sphere generation method.

Scaling the range does not have a large effect on most problem classes, so fixing a cost range seems okay. If we want to compare against easy problems, we may use the first option too.

Analysis: Research Question 2 (R2) - Which instances are hard to solve?

We focus on cpu time (tpstotal) and try to get an overview over difficulty of the instances.

R2a: What is the CPU time for each instance?

First let us have a general look:

The general picture is fuzzy due to different problem classes, ranges, generation methods and algorithm configurations. Let us have a closer look:

Some observations

  • There is no clear winner among the different algorithm configurations.
  • There seems to be a bigger difference for KP.
  • The hardest instances are the ones generated with the sphere methods.

Let us find the winner configuration for each instance given some groupings:

Overall, breadth-first methods win.

Note that if we look at different problem classes, different variable selection methods win.

If we group by coef, we have different winners. This may have something to do with the hardness of the instance.

Will different methods be used for harder instances (here we use YN as a proxy for hardness)? Let us try to plot the aggregated percentage of winner configurations:

It can be seen that for some problems the best configuration changes from easy problems to hard ones.

Finally, we may plot only the winning CPU times for each instance:

R2b: Which sphere generation instances are hardest to solve?

WE NEED INFO ABOUT SUPPORTED VS UNSUPPORTED

R2c: Is the CPU time the same within each instance group?

By instance group we mean a unique combination of namePrefix, constId, pb, n, coef, rangeC, nodesel, varsel, OB.

In general we may have high variation within instance groups.

R2d: In which subprocedures is the CPU time used?